The MAILING DATE of this communication appears on the cover sheet with the correspondence address --
Period for Reply

A SHORTENED STATUTORY PERIOD FOR REPLY IS SET TO EXPIRE 3 MONTH(S) FROM 
THE MAILING DATE OF THIS COMMUNICATION. 

- Extensions of time may be available under the provisions of 37 CFR 1.136(a). In no event, however, may a reply be timely filed 
after SIX (6) MONTHS from the mailing date of this communication. 

- If the period for reply specified above is less than thirty (30) days, a reply within the statutory minimum of thirty (30) days will be considered timely. 

- If NO period for reply is specified above, the maximum statutory period will apply and will expire SIX (6) MONTHS from the mailing date of this communication. 

- Failure to reply within the set or extended period for reply will, by statute, cause the application to become ABANDONED (35 U.S.C. § 133). 
Any reply received by the Office later than three months after the mailing date of this communication, even if timely filed, may reduce any 
earned patent term adjustment. See 37 CFR 1.704(b).

Status 

1) ☒ Responsive to communication(s) filed on 22 November 2004.
2a) ☒ This action is FINAL. 2b) □ This action is non-final.

3) □ Since this application is in condition for allowance except for formal matters, prosecution as to the merits is
closed in accordance with the practice under Ex parte Quayle, 1935 C.D. 11, 453 O.G. 213.

Disposition of Claims 

4) ☒ Claim(s) 1-6, 8-23 and 25-35 is/are pending in the application.

4a) Of the above claim(s) ____ is/are withdrawn from consideration.

5) □ Claim(s) ____ is/are allowed.

6) ☒ Claim(s) 1-6, 8-23 and 25-35 is/are rejected.

7) □ Claim(s) ____ is/are objected to.

8) □ Claim(s) ____ are subject to restriction and/or election requirement.

Application Papers 

9) □ The specification is objected to by the Examiner.

10) ☒ The drawing(s) filed on 30 June 2000 is/are: a) ☒ accepted or b) □ objected to by the Examiner.

Applicant may not request that any objection to the drawing(s) be held in abeyance. See 37 CFR 1.85(a).
Replacement drawing sheet(s) including the correction is required if the drawing(s) is objected to. See 37 CFR 1.121(d).

11) □ The oath or declaration is objected to by the Examiner. Note the attached Office Action or form PTO-152.

Priority under 35 U.S.C. § 119

1. □ Certified copies of the priority documents have been received.

2. □ Certified copies of the priority documents have been received in Application No. ____.

3. □ Copies of the certified copies of the priority documents have been received in this National Stage
application from the International Bureau (PCT Rule 17.2(a)).
* See the attached detailed Office action for a list of the certified copies not received.

DETAILED ACTION
Continued Examination Under 37 CFR 1.114 

1. A request for continued examination under 37 CFR 1.114, including the 
fee set forth in 37 CFR 1.17(e), was filed in this application after final rejection. 
Since this application is eligible for continued examination under 37 CFR 1.114, 
and the fee set forth in 37 CFR 1.17(e) has been timely paid, the finality of the 
previous Office action has been withdrawn pursuant to 37 CFR 1.114. 
Applicant's submission filed on 11/22/04 has been entered.

2. The proposed claim attached to the Letter Requesting Interview with the
Examiner, 11/02/04, will not be entered as an amendment due to improper
format.

Response to Arguments 

3. In an interview with the Applicant's representative, Mark Farrell, on January
25, 2005, no agreement was reached concerning the proposed and amended
claims.

Claim Rejections - 35 USC § 103 

4. The following is a quotation of 35 U.S.C. 103(a) which forms the basis for
all obviousness rejections set forth in this Office action:

(a) A patent may not be obtained though the invention is not identically 
disclosed or described as set forth in section 102 of this title, if the 
differences between the subject matter sought to be patented and the 
prior art are such that the subject matter as a whole would have been 
obvious at the time the invention was made to a person having ordinary 
skill in the art to which said subject matter pertains. Patentability shall not 
be negatived by the manner in which the invention was made.

5. Claims 1-35 are rejected under 35 U.S.C. 103(a) as being unpatentable
over Ramaswamy et al. (U.S. Patent No. 6,188,976 filed Oct. 23, 1998) in view of
Bangalore et al. (U.S. Patent No. 6,317,707 filed Dec. 7, 1998).

Ramaswamy et al. and Bangalore et al. are analogous art in that they both 
deal with language modeling. 

As per claims 1, 18, 19, 20, 27 and 28, Ramaswamy et al. discloses a
method comprising:

developing a language model from a tuning set of information (C.2. lines
44-48);

segmenting at least a subset of received textual corpus by clustering
every N-items of the received corpus into a training unit, wherein resultant
training units are separated by gaps (C.6. line 67, C.7. lines 1, 2 - the separate
classes inherently include gaps);

calculating a perplexity value for each segment (C.4. lines 13-20 - the
external corpus is segmented into linguistic units and a perplexity value is
calculated for each unit); and

refining the language model with one or more segments of the received
corpus based, at least in part, on the calculated perplexity value for the one or
more segments (C.3. lines 47-52, C.4. lines 45-47 - the language model is updated
based upon the perplexity value).
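
For illustration only, the four steps recited above (develop a model from a tuning set, cluster every N items into a training unit, calculate a perplexity value per unit, refine the model with a selected unit) can be sketched as follows. This is a minimal sketch under stated assumptions: the add-one-smoothed unigram model, the function names, the value N = 5, and the sample text are all hypothetical and are not taken from Ramaswamy et al. or from the claims.

```python
import math
from collections import Counter

def train_unigram(tokens):
    """Build an add-one-smoothed unigram model (an assumed, simplistic model)."""
    counts = Counter(tokens)
    total = sum(counts.values())
    vocab = len(counts) + 1  # one extra slot reserved for unseen words
    def prob(word):
        return (counts[word] + 1) / (total + vocab)
    return prob

def segment(tokens, n):
    """Cluster every n items of the corpus into one training unit."""
    return [tokens[i:i + n] for i in range(0, len(tokens), n)]

def perplexity(prob, unit):
    """Perplexity = exp of the negative mean log-probability of the unit."""
    log_sum = sum(math.log(prob(w)) for w in unit)
    return math.exp(-log_sum / len(unit))

# Hypothetical tuning set and received corpus.
tuning = "show me the next e-mail show me the last e-mail".split()
corpus = "show me the next message stock prices fell sharply today".split()

model = train_unigram(tuning)
units = segment(corpus, 5)
scores = [perplexity(model, u) for u in units]

# Refine the model with the unit the current model predicts best.
best = min(range(len(units)), key=scores.__getitem__)
refined = train_unigram(tuning + units[best])
```

In this toy run the first unit resembles the tuning set and scores a lower perplexity than the second, so it is the one folded back into the model.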

Ramaswamy et al. does not disclose: 

calculating the similarity within a sequence of training chunks on either
side of each of the gaps; and

selecting segment boundaries that maximize intra-segment similarity and
inter-segment disparity.

However, Bangalore et al. teaches calculating the similarity within a
sequence of training chunks (Fig. 2, the space between clusters representing the
gaps, C.S. lines 4-12, 15-18, 22, 23 - the calculated radius determines the
similarity, C.4. lines 8-10 - provides the calculation) and selecting segment
boundaries that maximize intra-segment similarity and inter-segment disparity
(C.S. lines 15, 16 - the radius indicates the selected boundaries and compactness
maximizes segment similarity and inter-segment disparity, C.4. lines 7-15 - the
manipulative step of selecting the boundary that maximizes/improves
intra-segment similarity and inter-segment disparity is found in selecting "tighter
clusters" to occur first in a list; the tighter the cluster defined by the
boundary/radius/compactness, the more the intra-segment similarity and,
simultaneously, the inter-segment disparity improve; the ranking involves selecting these
maximized/improved clusters based on the boundary/radius/compactness).
Therefore, it would have been obvious at the time of the invention to one
ordinarily skilled in the art to combine Ramaswamy et al. with Bangalore et al.
The motivation for doing so would have been to provide, by language modeling
(Bangalore et al. C.2. lines 8-15), lexical significance and the ability to provide
insight from the language model into grammar (Bangalore et al. C.2. lines 14, 15), which, through
the vector representation in a geometrical space and relation to context words
(C.2. lines 8-13, 59-67), creates indexed lexically significant clusters (Bangalore
et al. C.4. lines 15-17).
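
As a rough illustration of the "tighter clusters first" ranking relied on above, the sketch below ranks clusters of vectors by a radius-style compactness value. The measure used here (mean Euclidean distance from the centroid), the cluster names, and the coordinates are assumptions made for the example; the calculation actually given in Bangalore et al. is not reproduced in this action.

```python
import math

def centroid(points):
    """Component-wise mean of a list of equal-length vectors."""
    dim = len(points[0])
    return tuple(sum(p[i] for p in points) / len(points) for i in range(dim))

def radius(points):
    """Mean Euclidean distance from the centroid; smaller means tighter."""
    c = centroid(points)
    return sum(math.dist(p, c) for p in points) / len(points)

# Hypothetical clusters in a two-dimensional vector space.
clusters = {
    "loose": [(0.0, 0.0), (0.0, 10.0)],
    "tight": [(0.0, 0.0), (0.0, 2.0)],
}

# Rank so that the most compact cluster occurs first in the list.
ranked = sorted(clusters, key=lambda name: radius(clusters[name]))
```

Under this assumed measure, a smaller radius means members sit closer together, which is one way to read greater intra-cluster similarity.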

As per claims 2 and 21, Ramaswamy et al. and Bangalore et al. disclose all of
the limitations of claim 1, upon which claim 2 depends. Ramaswamy et al. further
discloses:

the tuning set of information (C.5. lines 36-38 - test corpus) is application
specific (C.5. lines 40-44 - the application is speech recognition).

As per claim 3, Ramaswamy et al. and Bangalore et al. disclose all of the
limitations of claim 1, upon which claim 3 depends. Ramaswamy et al. further
discloses:

the tuning set of information is comprised of one or more application-specific
documents (C.7. lines 6-9 - the application is e-mail, the documents
comprise "show me the next e-mail...").

As per claim 4, Ramaswamy et al. and Bangalore et al. disclose all of the
limitations of claim 1, upon which claim 4 depends. Ramaswamy et al. further
discloses:

the tuning set of information is a highly accurate set of textual information
linguistically relevant to (C.2. lines 55-58), but not taken from, the received textual
corpus (C.3. lines 14-18 - the received corpus, i.e. the external corpus, comprises many
domains; however, the seed corpus is linguistically related to, but not taken from, the
external corpus).

As per claim 5, Ramaswamy et al. and Bangalore et al. disclose all of the
limitations of claim 1, upon which claim 5 depends. Ramaswamy et al. further
discloses:

a training set comprised of at least the subset of the received textual
corpus (C.5. lines 6-8, 14-17 - test corpus is at least the subset of the received
textual corpus).

As per claim 6, Ramaswamy et al. and Bangalore et al. disclose all of the 
limitations of claim 5, upon which claim 6 depends. Ramaswamy et al. further 
discloses: 

ranking the segments of the training set based, at least in part, on the
calculated perplexity value for each segment (C.4. lines 36-41, C.8. lines 34-36).

As per claim 8, Ramaswamy et al. and Bangalore et al. disclose all of the 
limitations of claim 1, upon which claim 8 depends. Ramaswamy et al. further 
discloses: 

the resultant segment defines a training chunk (C.7. lines 14-18 - the word
class is the chunk that is then used in subsequent processing steps).

As per claim 9, Ramaswamy et al. and Bangalore et al. disclose all of the
limitations of claim 1, upon which claim 9 depends. Ramaswamy et al. does not
disclose:

N is an empirically derived value based, at least in part, on the size of the
received corpus.

However, as is well known in the art, Bangalore et al. teaches having an
empirically derived N-vector for each item in the corpus, which thereby is based
upon the size of the corpus (C.2. lines 59-65), and every item is included in the
vector space (C.3. lines 7, 8, Fig. 2). Therefore, at the time of the invention, it
would have been obvious to one ordinarily skilled in the art to combine
Ramaswamy et al. with Bangalore et al. The motivation for doing so would have
been to include every item in the clustering process to improve subsequent
clustering results for determining the compactness of a cluster (Bangalore et al.
C.2. lines 65, 66, C.3. line 1, C.4. lines 7-14).

As per claim 10, Ramaswamy et al. and Bangalore et al. disclose all of
the limitations of claim 1, upon which claim 10 depends. Ramaswamy et al. does
not disclose:

the calculation of the similarity within a sequence of training units defines
a cohesion score.

However, Bangalore et al. teaches the calculation of the similarity within a
sequence of training units (C.3. lines 22, 23) defines a cohesion score (C.3. lines
15-19 - "very close relationship" is interpreted as the cohesion). Therefore, at the
time of the invention, it would have been obvious to one ordinarily skilled in the
art to combine Ramaswamy et al. with Bangalore et al. The motivation for doing
so would have been to determine how close or similar the training units were to
each other for the benefit of maximizing the clustering process of related items
(C.4. lines 12, 13).
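
To make the claimed notion of a cohesion score concrete, the sketch below scores each gap between adjacent training units by the similarity of the units on either side of it. It is only one possible reading: the cosine measure over word counts, the one-unit window, the function names, and the sample units are all assumptions, and neither the claims nor the cited references are stated here to use them.

```python
import math
from collections import Counter

def cosine(a, b):
    """Cosine similarity between two word-count vectors (Counters)."""
    dot = sum(a[w] * b[w] for w in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def gap_cohesion(units, window=1):
    """Return one cohesion score per gap between adjacent training units."""
    scores = []
    for gap in range(1, len(units)):
        left = Counter(w for u in units[max(0, gap - window):gap] for w in u)
        right = Counter(w for u in units[gap:gap + window] for w in u)
        scores.append(cosine(left, right))
    return scores

# Hypothetical training units, three items each.
units = [
    ["read", "next", "mail"],
    ["read", "next", "message"],
    ["stock", "prices", "fell"],
]
scores = gap_cohesion(units)

# The gap with the lowest cohesion is the most plausible segment boundary;
# boundary k means "between units k-1 and k".
boundary = scores.index(min(scores)) + 1
```

Placing a boundary at the least cohesive gap keeps similar units together and separates dissimilar ones, which is the intuition behind maximizing intra-segment similarity and inter-segment disparity.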

As per claim 11, Ramaswamy et al. and Bangalore et al. disclose all of 
the limitations of claim 10, upon which claim 11 depends. Ramaswamy et al. 
does not disclose: 

intra-segment similarity is measured by the cohesion score. 

However, Bangalore et al. teaches intra-segment similarity is measured by
the cohesion score (C.3. lines 15-19, 22, 23). Therefore, at the time of the
invention, it would have been obvious to one ordinarily skilled in the art to
combine Ramaswamy et al. with Bangalore et al. The motivation for doing so
would have been to measure how close or similar the intra-segment training units
were to each other for the benefit of maximizing the clustering process of related
items (C.3. lines 17-19, C.4. lines 13, 14), to improve subsequent language
modeling results.

As per claim 12, Ramaswamy et al. and Bangalore et al. disclose all of 
the limitations of claim 10, upon which claim 12 depends. Ramaswamy et al. 
does not disclose: 

inter-segment disparity is approximated from the cohesion score. 

However, Bangalore et al. teaches inter-segment disparity (C.3. lines 24, 25 - the
different vector coordinates are interpreted as inter-segments) is approximated from the
cohesion score (C.4. lines 35-45, Table 2 - the "Compactness Value" determines
the score and cohesion and the "Class Index" determines the inter-segment
disparity resulting from the cohesion score). Therefore, at the time of the
invention, it would have been obvious to one ordinarily skilled in the art to
combine Ramaswamy et al. with Bangalore et al. The motivation for doing so
would have been to determine how disparate or distinct the inter-segment
training units were to each other for the benefit of maximizing the clustering
process of related items (C.3. lines 15-19, C.4. lines 14, 15), to improve
subsequent language modeling results.

As per claim 13, Ramaswamy et al. and Bangalore et al. disclose all of
the limitations of claim 1, upon which claim 13 depends. Ramaswamy et al. does
not disclose:

the calculation of inter-segment disparity defines a depth score.

However, Bangalore et al. teaches the calculation of inter-segment
disparity defines a depth score (C.4. lines 12-16, 30-66, Table 2 - the depth of the
inter-segment disparity approximated from the cohesion score (compactness
value) is indicated as the values are "deeper" as they are farther down the list).
Therefore, at the time of the invention, it would have been obvious to one
ordinarily skilled in the art to combine Ramaswamy et al. with Bangalore et al.
The motivation for doing so would have been to determine the depth of the
disparity in a ranked manner to visually determine the relatedness of different
classes or inter-segment disparity by index (C.4. Table 2 - visual depth benefit).

As per claim 14, Ramaswamy et al. and Bangalore et al. disclose all of
the limitations of claim 1, upon which claim 14 depends. Ramaswamy et al.
further discloses:

the perplexity value is a measure of the predictive power of a certain
language model to a segment of the received corpus (C.4. lines 16-21).

As per claim 15, Ramaswamy et al. and Bangalore et al. disclose all of
the limitations of claim 1, upon which claim 15 depends. Ramaswamy et al.
further discloses:

ranking the segments of at least the subset of the received corpus based,
at least in part, on the calculated perplexity value of each segment (C.4. lines
36-40, C.8. lines 34, 35); and

updating the tuning set of information with one or more of the segments 
from at least the subset of the received corpus (C.4. lines 41-47). 

As per claim 16, Ramaswamy et al. and Bangalore et al. disclose all of 
the limitations of claim 15, upon which claim 16 depends. Ramaswamy et al. 
further discloses: 

one or more of the segments with the lowest perplexity value from at least
the subset of the received corpus are added to the tuning set (C.4. lines 41-47 -
"...below the perplexity threshold...").
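
The ranking and updating steps discussed for claims 15 and 16 can be pictured with the short sketch below, which ranks segments by perplexity and adds those below a threshold to the tuning set. The segment identifiers, perplexity values, threshold, and function name are hypothetical placeholders chosen for the example, not values from the record.

```python
def select_segments(perplexities, threshold):
    """Rank segments by ascending perplexity and keep those below threshold."""
    ranked = sorted(perplexities.items(), key=lambda kv: kv[1])
    return [seg for seg, ppl in ranked if ppl < threshold]

# Hypothetical perplexity values already calculated for three segments.
perplexities = {"seg_a": 42.0, "seg_b": 310.5, "seg_c": 97.3}

# Update the tuning set with the lowest-perplexity segments.
tuning_set = ["seed_1"]
tuning_set.extend(select_segments(perplexities, threshold=100.0))
```

A lower perplexity means the current model predicts the segment well, so such segments are treated as the most relevant additions.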

As per claims 17 and 25, Ramaswamy et al. and Bangalore et al. 
disclose all of the limitations of claim 1, upon which claim 17 depends. 
Ramaswamy et al. further discloses: 

utilizing the refined language model in an application (C.5. lines 40-42 - the
application is speech recognition) to predict a likelihood of another corpus
(C.5. lines 42-45 - the likelihood is interpreted as the "accuracy... for the current
language model"; the other corpus is the test corpus).

As per claim 22, Ramaswamy et al. and Bangalore et al. disclose all of 
the limitations of claim 20, upon which claim 22 depends. Ramaswamy et al. 
does not disclose: 

the language model agent ranks the segments of the training set based, at
least in part, on a measure of similarity between two or more segments.

However, Bangalore et al. teaches ranking the segments of a training set
based on a measure of similarity (C.4. lines 9-16 - compactness value between
segments) between segments. Therefore, at the time of the invention, it would
have been obvious to one ordinarily skilled in the art to combine Ramaswamy et
al. with Bangalore et al. The motivation for doing so would have been to identify
by a ranking system the segments of varied similarity measurements in order to
maximize the clustering process (C.4. lines 13, 14) to further improve any
successive language modeling resulting from using the provided clustering data
(C.4. lines 20-29, Table 2).

As per claim 23, Ramaswamy et al. and Bangalore et al. disclose all of 
the limitations of claim 22, upon which claim 23 depends. Ramaswamy et al. 
does not disclose: 

the similarity measure is calculated for adjacent segments. 

However, Bangalore et al. teaches having a similarity measure calculated
for adjacent segments (C.2. lines 29-31, C.2. lines 59-65, C.3. line 1, C.3. lines
15-17). Therefore, at the time of the invention, it would have been obvious to
combine Ramaswamy et al. with Bangalore et al. The motivation for doing so
would have been to obtain similarity measurements of adjacent segments in
order to maximize the clustering process to further improve any successive
language modeling resulting from using the provided clustering data for the
benefit of determining a grammatical model (C.4. lines 15-17, C.5. lines 37-47).

As per claim 26, Ramaswamy et al. and Bangalore et al. disclose all of
the limitations of claim 25, upon which claim 26 depends. Ramaswamy et al.
further discloses:

the application is one or more of a spelling and/or grammar checker, a 
word-processor, a speech recognition application, a language translation 
application, and the like (C.5.lines 40-42, the application is speech recognition). 

As per claim 29, Ramaswamy et al. and Bangalore et al. disclose all of 
the limitations of claim 28, upon which claim 29 depends. Ramaswamy et al. 
further discloses: 

the tuning set is dynamically selected as relevant to the received corpus 
(C.a.lines 47-54). 

As per claim 30, Ramaswamy et al. and Bangalore et al. disclose all of 
the limitations of claim 28, upon which claim 30 depends. Ramaswamy et al. 
further discloses: 

a dynamic lexicon generation function, to develop an initial lexicon from
the tuning set (C.3. lines 42-44 - the tuning set (seed corpus) is used to develop an
initial lexicon (corpus)), and to update the lexicon with the select segments from
the received corpus (C.3. lines 50-55 - "...adding linguistic units to relevant
corpus" - the relevant corpus being the updated lexicon).

As per claim 31, Ramaswamy et al. discloses all of the limitations of claim 
28, upon which claim 31 depends. Ramaswamy et al. does not disclose: 

a frequency analysis function, to determine a frequency of occurrence of
segments within the received corpus.

However, Bangalore et al. teaches having a function based upon
frequencies for each input word, which determines the frequencies of segments
within the received corpus (C.3. lines 45-47). Therefore, at the time of the
invention, it would have been obvious to one ordinarily skilled in the art to
combine Ramaswamy et al. with Bangalore et al. The motivation for doing so
would have been to assist in building a cluster in the well known method of
having a vector space to hold the clusters with the frequency of each segment
being incorporated into the cluster for the benefit of maximizing the clustering
segments (C.3. lines 18-20, 62, 63, C.4. lines 13, 14), to improve
subsequent language modeling results.

As per claim 32, Ramaswamy et al. and Bangalore et al. disclose all of
the limitations of claim 28, upon which claim 32 depends. Ramaswamy et al.
further discloses:

a dynamic segmentation function (C.5. lines 1-3), to iteratively segment the
received corpus (C.5. lines 1-3) to improve a predictive performance attribute of
the modeling agent (C.5. lines 6-9 - "to improve language model quality..."
comprising evaluating perplexity change, which is interpreted as the predictive
performance).

As per claim 33, Ramaswamy et al. and Bangalore et al. disclose all of
the limitations of claim 32, upon which claim 33 depends. Ramaswamy et al.
further discloses:

the dynamic segmentation function iteratively re-segments the received
corpus until the language model reaches an acceptable threshold (C.5. lines 1, 2,
9-15 - the external corpus is segmented iteratively, by extracting linguistic units,
until the language model is updated once "...a certain number...", a threshold,
is reached).

As per claim 34, Ramaswamy et al. discloses all of the limitations of claim 
32, upon which claim 34 depends. Ramaswamy et al. does not disclose: 

a frequency analysis function, to determine a frequency of occurrence of 
segments within the received corpus. 

However, Bangalore et al. teaches having a function based upon 
frequencies for each input word, which determines the frequencies of segments 
within the received corpus (C.2. lines 59, 60). Therefore, at the time of the 
invention, it would have been obvious to one ordinarily skilled in the art to 
combine Ramaswamy et al. with Bangalore et al. The motivation for doing so 
would have been to assist in building a cluster in the well known method of 
having a vector space to hold the clusters with the frequency of each segment 
being incorporated into the cluster for the benefit of maximizing the clustering 
segments (C.3.lines 18-20, 62, 63, C.4.lines 13, 14), to better improve 
subsequent language modeling results. 

As per claim 35, Ramaswamy et al. and Bangalore et al. disclose all of 
the limitations of claim 34, upon which claim 35 depends. Ramaswamy et al. 
further discloses: 

the data structure generator removes segments from the data structure
that do not meet a minimum frequency threshold (C.4. lines 29-31 - it is well known
that the relevancy of the segments is based in part on frequency), and
dynamically re-segments the received corpus to improve predictive capability
while reducing the size of the data structure (C.5. lines 1-3, C.5. lines 6-9 - "to
improve language model quality..." comprising evaluating perplexity change,
which is interpreted as the predictive performance).
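
The frequency analysis and pruning recited for claims 31, 34 and 35 can be illustrated with the sketch below, which counts how often each segment occurs and removes those that fall under a minimum frequency threshold. The sample segments, the threshold of 2, and the function name are assumptions made for the example only.

```python
from collections import Counter

def prune_by_frequency(segments, min_count):
    """Count segment occurrences and drop those seen fewer than min_count times."""
    counts = Counter(segments)
    return {seg: n for seg, n in counts.items() if n >= min_count}

# Hypothetical segments extracted from a received corpus.
segments = [
    "next e-mail", "show me", "next e-mail",
    "stock prices", "show me", "show me",
]
kept = prune_by_frequency(segments, min_count=2)
```

Dropping rare segments shrinks the data structure, at the cost of losing entries that occur only occasionally.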

Conclusion 

6. This is a continuation of applicant's earlier Application No. 09/607,786. All 
claims are drawn to the same invention claimed in the earlier application and 
could have been finally rejected on the grounds and art of record in the next 
Office action if they had been entered in the earlier application. Accordingly, 
THIS ACTION IS MADE FINAL even though it is a first action in this case. See 
MPEP § 706.07(b). Applicant is reminded of the extension of time policy as set 
forth in 37 CFR 1.136(a). 

A shortened statutory period for reply to this final action is set to expire 
THREE MONTHS from the mailing date of this action. In the event a first reply is 
filed within TWO MONTHS of the mailing date of this final action and the advisory 
action is not mailed until after the end of the THREE-MONTH shortened statutory 
period, then the shortened statutory period will expire on the date the advisory 
action is mailed, and any extension fee pursuant to 37 CFR 1.136(a) will be 
calculated from the mailing date of the advisory action. In no event, however, will
the statutory period for reply expire later than SIX MONTHS from the mailing
date of this final action.

7. Any inquiry concerning this communication or earlier communications from
the examiner should be directed to Lamont M. Spooner whose telephone number
is 703/305-8661. The examiner can normally be reached on 8:00 AM - 5:00 PM.

If attempts to reach the examiner by telephone are unsuccessful, the
examiner's supervisor, Richemond Dorvil, can be reached on 703/305-9645. The
fax phone number for the organization where this application or proceeding is
assigned is 703-872-9306.

Information regarding the status of an application may be obtained from
the Patent Application Information Retrieval (PAIR) system. Status information
for published applications may be obtained from either Private PAIR or Public
PAIR. Status information for unpublished applications is available through
Private PAIR only. For more information about the PAIR system, see
http://pair-direct.uspto.gov. Should you have questions on access to the Private PAIR
system, contact the Electronic Business Center (EBC) at 866-217-9197
(toll-free).



lms
02/15/2005

RICHEMOND DORVIL
SUPERVISORY PATENT EXAMINER